Putumayo Department
Using LLMs to create analytical datasets: A case study of reconstructing the historical memory of Colombia
Anderson, David, Benitez, Galia, Bjarnadottir, Margret, Reyya, Shriyan
Colombia has been submerged in decades of armed conflict, yet until recently, the systematic documentation of violence was not a priority for the Colombian government. This has resulted in a lack of publicly available conflict information and, consequently, a lack of historical accounts. This study contributes to Colombia's historical memory by utilizing GPT, a large language model (LLM), to read and answer questions about over 200,000 violence-related newspaper articles in Spanish. We use the resulting dataset to conduct both descriptive analysis and a study of the relationship between violence and the eradication of coca crops, offering an example of policy analyses that such data can support. Our study demonstrates how LLMs have opened new research opportunities by enabling examinations of large text corpora at a previously infeasible depth.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- South America > Colombia > Bolivar Department (0.04)
- South America > Colombia > Southwest Colombia (0.04)
- (7 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Government > Military (1.00)
- Government > Regional Government > South America Government > Colombia Government (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Unsupervised Episode Detection for Large-Scale News Events
Kargupta, Priyanka, Zhang, Yunyi, Jiao, Yizhu, Ouyang, Siru, Han, Jiawei
Episodic structures are inherently interpretable and adaptable to evolving large-scale key events. However, state-of-the-art automatic event detection methods overlook event episodes and, therefore, struggle with these crucial characteristics. This paper introduces a novel task, episode detection, aimed at identifying episodes from a news corpus containing key event articles. An episode describes a cohesive cluster of core entities (e.g., "protesters", "police") performing actions at a specific time and location. Furthermore, an episode is a significant part of a larger group of episodes under a particular key event. Automatically detecting episodes is challenging because, unlike key events and atomic actions, we cannot rely on explicit mentions of times and locations to distinguish between episodes or use semantic similarity to merge inconsistent episode co-references. To address these challenges, we introduce EpiMine, an unsupervised episode detection framework that (1) automatically identifies the most salient, key-event-relevant terms and segments, (2) determines candidate episodes in an article based on natural episodic partitions estimated through shifts in discriminative term combinations, and (3) refines and forms final episode clusters using large language model-based reasoning on the candidate episodes. We construct three diverse, real-world event datasets annotated at the episode level. EpiMine outperforms all baselines on these datasets by an average 59.2% increase across all metrics.
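The idea in step (2), splitting an article where the combination of salient terms shifts, can be illustrated with a toy sketch. This is not EpiMine's actual procedure, which the abstract does not specify. The Jaccard overlap measure, the `threshold` value, and the function names are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def partition_segments(segment_terms, threshold=0.2):
    """Group consecutive segments into candidate partitions.

    segment_terms: list of sets, one set of salient terms per text segment.
    A new partition starts whenever the overlap with the previous segment
    falls below `threshold` (a hypothetical cut-off, not from the paper).
    Returns a list of lists of segment indices.
    """
    if not segment_terms:
        return []
    groups = [[0]]
    for i in range(1, len(segment_terms)):
        if jaccard(segment_terms[i - 1], segment_terms[i]) < threshold:
            groups.append([i])      # term combination shifted: new candidate
        else:
            groups[-1].append(i)    # same term combination: extend current one
    return groups


# Made-up segments: two about a march, two about a court ruling.
segments = [
    {"protesters", "march"},
    {"protesters", "march", "police"},
    {"court", "ruling"},
    {"court", "ruling", "judge"},
]
candidate_partitions = partition_segments(segments)
```

A real system would still need the refinement in step (3), since adjacent-segment overlap alone cannot merge mentions of the same episode that appear in different articles.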
- North America > Haiti (0.14)
- Asia > China > Hong Kong (0.07)
- North America > United States > Illinois (0.05)
- (22 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Government (1.00)
- Media > Television (0.94)
- Leisure & Entertainment (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Predicting the Geothermal Gradient in Colombia: a Machine Learning Approach
Mejía-Fragoso, Juan Camilo, Florez, Manuel A., Bernal-Olaya, Rocío
Accurate determination of the geothermal gradient is critical for assessing the geothermal energy potential of a given region. Of particular interest is the case of Colombia, a country with abundant geothermal resources. A history of active oil and gas exploration and production has left drilled boreholes in different geological settings, providing direct measurements of the geothermal gradient. Unfortunately, large regions of the country where geothermal resources might exist lack such measurements. Indirect geophysical measurements are costly and difficult to perform at regional scales. Computational thermal models could be constructed, but they require very detailed knowledge of the underlying geology and uniform sampling of subsurface temperatures to be well-constrained. We present an alternative approach that leverages recent advances in supervised machine learning and available direct measurements to predict the geothermal gradient in regions where only global-scale geophysical datasets and coarse geological knowledge are available. We find that a Gradient Boosted Regression Tree algorithm yields optimal predictions and extensively validate the trained model. We show that predictions of our model are within 12% accuracy and that independent measurements performed by other authors agree well with our model. Finally, we present a geothermal gradient map for Colombia that highlights regions where further exploration and data collection should be performed.
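For readers unfamiliar with gradient boosted regression trees, the following is a minimal sketch of the general technique, not the authors' model. It assumes a single input feature, depth-one trees (stumps), squared-error loss, and synthetic data. The function names, `n_rounds`, and `learning_rate` are illustrative choices.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the split on x that
    minimises squared error against the residuals.
    Assumes xs holds at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model: a constant plus a sequence of shrunken stumps,
    each trained on the residuals left by the previous rounds."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Evaluate the boosted model at a single feature value x."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


# Synthetic data (not from the paper): a step from 20 to 40 at x = 5.
xs = list(range(10))
ys = [20.0 if x < 5 else 40.0 for x in xs]
model = fit_boosted(xs, ys)
```

Each round fits the residuals left by earlier rounds, and the learning rate shrinks every stump's contribution so that the fit improves gradually. Production libraries add deeper trees, multiple features, and regularisation.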
- South America > Ecuador (0.28)
- South America > Guyana (0.14)
- South America > Colombia > Putumayo Department (0.14)
- (25 more...)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development (1.00)
- Energy > Renewable > Geothermal > Geothermal Resource Type (0.68)
- South America > Colombia > Putumayo Department (0.05)
- Oceania > New Zealand (0.05)
- Europe > France (0.05)
- Africa > Democratic Republic of the Congo (0.05)
Landslide Geohazard Assessment With Convolutional Neural Networks Using Sentinel-2 Imagery Data
Ullo, Silvia L., Langenkamp, Maximillian S., Oikarinen, Tuomas P., Del Rosso, Maria P., Sebastianelli, Alessandro, Piccirillo, Federica, Sica, Stefania
In this paper, we aim to combine state-of-the-art image recognition models with the best publicly available satellite images to create a system for landslide risk mitigation. We focus first on landslide detection and further propose a similar system to be used for prediction. Such models are valuable because they could easily be scaled up to provide data for hazard evaluation as satellite imagery becomes increasingly available. The goal is to use satellite images and correlated data to enrich the public repository of data and guide disaster relief efforts by locating the precise areas where landslides have occurred. Different image augmentation methods are used to increase diversity in the chosen dataset and make classification more robust. The resulting outputs are then fed into variants of 3-D convolutional neural networks. A review of the current literature indicates there is no research using CNNs (Convolutional Neural Networks) and freely available satellite imagery for classifying landslide risk. The model is shown to achieve significantly better than baseline accuracy.
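The abstract does not name the augmentation methods used. As a hedged illustration, the sketch below assumes flips and 90-degree rotations, which are common choices for overhead imagery, and applies them to a single-band tile stored as a list of rows. The function names and the tiny tile are hypothetical.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top to bottom)."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the tile by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original tile plus two flips and three rotations."""
    variants = [image, flip_horizontal(image), flip_vertical(image)]
    rotated = image
    for _ in range(3):
        rotated = rotate_90_clockwise(rotated)
        variants.append(rotated)
    return variants


# A made-up 2x2 single-band tile, used only to exercise the functions.
tile = [[1, 2],
        [3, 4]]
augmented_tiles = augment(tile)
```

Each variant keeps the original label (landslide or no landslide), so one labelled tile yields several training examples. Multi-band Sentinel-2 data would apply the same transforms to every band.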
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- South America > Colombia > Putumayo Department > Mocoa (0.04)
- North America > United States > California (0.04)
- (8 more...)